Overview

Dataset statistics

Number of variables: 9
Number of observations: 5693
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 400.4 KiB
Average record size in memory: 72.0 B
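
The two memory figures above are consistent with each other. A rough check, assuming every one of the 9 numeric columns stores 8-byte values and that the frame carries a 128-byte index (both are assumptions; the report does not state dtypes or index size):

```python
# Figures taken from the dataset statistics.
n_rows = 5693
n_cols = 9

# Assumptions, not stated in the report:
bytes_per_value = 8   # 64-bit numeric dtype for every column
index_bytes = 128     # small fixed-size index object

total_bytes = n_rows * n_cols * bytes_per_value + index_bytes
total_kib = total_bytes / 1024
avg_record_bytes = total_bytes / n_rows

print(f"{total_kib:.1f} KiB in total, {avg_record_bytes:.1f} B per record")
```

Under the same assumptions a single column plus the index comes to about 44.6 KiB, which matches the per-variable memory size reported further down.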

Variable types

Numeric: 9

Alerts

gross_revenue is highly correlated with qtt_invoices and 2 other fields [High correlation]
recency_days is highly correlated with qtt_invoices [High correlation]
qtt_invoices is highly correlated with gross_revenue and 4 other fields [High correlation]
unique_products is highly correlated with gross_revenue and 1 other field [High correlation]
unique_items is highly correlated with gross_revenue and 3 other fields [High correlation]
daily_purchase_rate is highly correlated with qtt_invoices and 1 other field [High correlation]
total_prod_returned is highly correlated with qtt_invoices [High correlation]
gross_revenue is highly correlated with qtt_invoices and 1 other field [High correlation]
qtt_invoices is highly correlated with gross_revenue and 1 other field [High correlation]
unique_items is highly correlated with gross_revenue and 1 other field [High correlation]
gross_revenue is highly correlated with qtt_invoices and 2 other fields [High correlation]
qtt_invoices is highly correlated with gross_revenue and 2 other fields [High correlation]
unique_products is highly correlated with gross_revenue and 1 other field [High correlation]
unique_items is highly correlated with gross_revenue and 2 other fields [High correlation]
daily_purchase_rate is highly correlated with qtt_invoices [High correlation]
df_index is highly correlated with customer_id and 1 other field [High correlation]
customer_id is highly correlated with df_index and 1 other field [High correlation]
gross_revenue is highly correlated with qtt_invoices and 3 other fields [High correlation]
recency_days is highly correlated with df_index and 1 other field [High correlation]
qtt_invoices is highly correlated with gross_revenue and 3 other fields [High correlation]
unique_products is highly correlated with gross_revenue and 2 other fields [High correlation]
unique_items is highly correlated with gross_revenue and 3 other fields [High correlation]
total_prod_returned is highly correlated with gross_revenue and 2 other fields [High correlation]
gross_revenue is highly skewed (γ1 = 23.1789984) [Skewed]
unique_items is highly skewed (γ1 = 25.19954694) [Skewed]
total_prod_returned is highly skewed (γ1 = 28.80267821) [Skewed]
df_index is uniformly distributed [Uniform]
df_index has unique values [Unique]
customer_id has unique values [Unique]
total_prod_returned has 4191 (73.6%) zeros [Zeros]
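
To give a feel for the Skewed and Zeros alerts, here is a minimal sketch that computes a skewness coefficient γ1 as the third central moment divided by the cubed standard deviation, plus the share of zeros. The helper names and the toy data are hypothetical, and the estimator the report itself uses may differ (for example by applying a bias correction), so the numbers are illustrative only.

```python
def skewness(values):
    """Population skewness: third central moment over std cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical toy data: a symmetric sample and a zero-heavy, long-tailed one.
symmetric = [1, 2, 3, 4, 5]
long_tail = [0, 0, 0, 0, 0, 0, 1, 2, 3, 100]

print(skewness(symmetric))    # close to 0 for a symmetric sample
print(skewness(long_tail))    # large and positive: one extreme value dominates
print(zero_share(long_tail))  # share of zeros in the toy sample

# The Zeros alert in the report is the same ratio on the real column.
print(f"{4191 / 5693:.1%}")
```

A single extreme value is enough to push γ1 far from zero, which is the pattern the maximum values of gross_revenue, unique_items and total_prod_returned suggest.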

Reproduction

Analysis started: 2022-08-22 21:26:48.797450
Analysis finished: 2022-08-22 21:27:23.224265
Duration: 34.43 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 5693
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2894.806605
Minimum: 0
Maximum: 5783
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 44.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 289.6
Q1: 1454
median: 2897
Q3: 4339
95-th percentile: 5492.4
Maximum: 5783
Range: 5783
Interquartile range (IQR): 2885

Descriptive statistics

Standard deviation: 1668.357134
Coefficient of variation (CV): 0.5763276661
Kurtosis: -1.196233201
Mean: 2894.806605
Median Absolute Deviation (MAD): 1443
Skewness: -0.003582371647
Sum: 16480134
Variance: 2783415.527
Monotonicity: Strictly increasing
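
Several of the df_index statistics follow from one another through simple identities (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean x count). A small sketch that re-derives them from the rounded values printed in the report; the variable names are illustrative, and agreement is only expected up to rounding:

```python
# Values copied from the df_index statistics (already rounded by the report).
n_observations = 5693
minimum, maximum = 0, 5783
q1, q3 = 1454, 4339
mean = 2894.806605
std = 1668.357134

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n_observations     # Sum

print(value_range, iqr)
print(f"CV = {cv:.10f}")
print(f"Variance = {variance:.3f}")
print(f"Sum = {total:.0f}")
```

The same identities hold for every variable in the report, so they are a quick way to sanity-check a transcribed block.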
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
0                     1      < 0.1%
3888                  1      < 0.1%
3864                  1      < 0.1%
3863                  1      < 0.1%
3862                  1      < 0.1%
3861                  1      < 0.1%
3860                  1      < 0.1%
3859                  1      < 0.1%
3858                  1      < 0.1%
3857                  1      < 0.1%
Other values (5683)   5683   99.8%

Value                 Count  Frequency (%)
0                     1      < 0.1%
1                     1      < 0.1%
2                     1      < 0.1%
3                     1      < 0.1%
4                     1      < 0.1%
5                     1      < 0.1%
6                     1      < 0.1%
7                     1      < 0.1%
8                     1      < 0.1%
9                     1      < 0.1%

Value                 Count  Frequency (%)
5783                  1      < 0.1%
5782                  1      < 0.1%
5781                  1      < 0.1%
5780                  1      < 0.1%
5779                  1      < 0.1%
5778                  1      < 0.1%
5777                  1      < 0.1%
5776                  1      < 0.1%
5775                  1      < 0.1%
5774                  1      < 0.1%

customer_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 5693
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16601.49587
Minimum: 12347
Maximum: 22709
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.6 KiB

Quantile statistics

Minimum: 12347
5-th percentile: 12700.6
Q1: 14289
median: 16229
Q3: 18211
95-th percentile: 21732.8
Maximum: 22709
Range: 10362
Interquartile range (IQR): 3922

Descriptive statistics

Standard deviation: 2808.146205
Coefficient of variation (CV): 0.1691501914
Kurtosis: -0.8215450043
Mean: 16601.49587
Median Absolute Deviation (MAD): 1962
Skewness: 0.4411283657
Sum: 94512316
Variance: 7885685.106
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
17850                 1      < 0.1%
21111                 1      < 0.1%
16498                 1      < 0.1%
13745                 1      < 0.1%
15584                 1      < 0.1%
21089                 1      < 0.1%
21088                 1      < 0.1%
21087                 1      < 0.1%
21086                 1      < 0.1%
15578                 1      < 0.1%
Other values (5683)   5683   99.8%

Value                 Count  Frequency (%)
12347                 1      < 0.1%
12348                 1      < 0.1%
12349                 1      < 0.1%
12350                 1      < 0.1%
12352                 1      < 0.1%
12353                 1      < 0.1%
12354                 1      < 0.1%
12355                 1      < 0.1%
12356                 1      < 0.1%
12357                 1      < 0.1%

Value                 Count  Frequency (%)
22709                 1      < 0.1%
22708                 1      < 0.1%
22707                 1      < 0.1%
22706                 1      < 0.1%
22705                 1      < 0.1%
22704                 1      < 0.1%
22700                 1      < 0.1%
22699                 1      < 0.1%
22696                 1      < 0.1%
22695                 1      < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 5447
Distinct (%): 95.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1753.54654
Minimum: 0.42
Maximum: 279138.02
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.6 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 13.128
Q1: 236.09
median: 612.78
Q3: 1569.11
95-th percentile: 5260.23
Maximum: 279138.02
Range: 279137.6
Interquartile range (IQR): 1333.02

Descriptive statistics

Standard deviation: 7495.95906
Coefficient of variation (CV): 4.27474201
Kurtosis: 705.9115845
Mean: 1753.54654
Median Absolute Deviation (MAD): 478.74
Skewness: 23.1789984
Sum: 9982940.45
Variance: 56189402.23
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
7.95                  9      0.2%
1.25                  8      0.1%
2.95                  8      0.1%
4.95                  8      0.1%
12.75                 7      0.1%
3.75                  7      0.1%
1.65                  7      0.1%
5.95                  6      0.1%
7.5                   6      0.1%
4.25                  6      0.1%
Other values (5437)   5621   98.7%

Value                 Count  Frequency (%)
0.42                  1      < 0.1%
0.65                  1      < 0.1%
0.79                  1      < 0.1%
0.84                  4      0.1%
0.85                  3      0.1%
1.07                  1      < 0.1%
1.25                  8      0.1%
1.44                  1      < 0.1%
1.65                  7      0.1%
1.69                  1      < 0.1%

Value                 Count  Frequency (%)
279138.02             1      < 0.1%
259657.3              1      < 0.1%
194550.79             1      < 0.1%
140450.72             1      < 0.1%
124564.53             1      < 0.1%
117379.63             1      < 0.1%
91062.38              1      < 0.1%
72882.09              1      < 0.1%
66653.56              1      < 0.1%
65039.62              1      < 0.1%

recency_days
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 304
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.8703671
Minimum: 0
Maximum: 373
Zeros: 37
Zeros (%): 0.6%
Negative: 0
Negative (%): 0.0%
Memory size: 44.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 23
median: 71
Q3: 200
95-th percentile: 338
Maximum: 373
Range: 373
Interquartile range (IQR): 177

Descriptive statistics

Standard deviation: 111.6209282
Coefficient of variation (CV): 0.9550832341
Kurtosis: -0.6403154021
Mean: 116.8703671
Median Absolute Deviation (MAD): 61
Skewness: 0.8153515696
Sum: 665343
Variance: 12459.23161
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
1                     110    1.9%
4                     105    1.8%
3                     98     1.7%
2                     92     1.6%
10                    86     1.5%
8                     82     1.4%
17                    79     1.4%
9                     79     1.4%
7                     78     1.4%
15                    66     1.2%
Other values (294)    4818   84.6%

Value                 Count  Frequency (%)
0                     37     0.6%
1                     110    1.9%
2                     92     1.6%
3                     98     1.7%
4                     105    1.8%
5                     52     0.9%
7                     78     1.4%
8                     82     1.4%
9                     79     1.4%
10                    86     1.5%

Value                 Count  Frequency (%)
373                   23     0.4%
372                   23     0.4%
371                   17     0.3%
369                   4      0.1%
368                   13     0.2%
367                   16     0.3%
366                   15     0.3%
365                   19     0.3%
364                   11     0.2%
362                   7      0.1%

qtt_invoices
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 56
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.471807483
Minimum: 1
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 4
95-th percentile: 11
Maximum: 206
Range: 205
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 6.814409585
Coefficient of variation (CV): 1.962784405
Kurtosis: 301.9942725
Mean: 3.471807483
Median Absolute Deviation (MAD): 0
Skewness: 13.19072233
Sum: 19765
Variance: 46.43617799
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
1                     2870   50.4%
2                     825    14.5%
3                     502    8.8%
4                     394    6.9%
5                     237    4.2%
6                     173    3.0%
7                     138    2.4%
8                     98     1.7%
9                     69     1.2%
10                    55     1.0%
Other values (46)     332    5.8%

Value                 Count  Frequency (%)
1                     2870   50.4%
2                     825    14.5%
3                     502    8.8%
4                     394    6.9%
5                     237    4.2%
6                     173    3.0%
7                     138    2.4%
8                     98     1.7%
9                     69     1.2%
10                    55     1.0%

Value                 Count  Frequency (%)
206                   1      < 0.1%
199                   1      < 0.1%
124                   1      < 0.1%
97                    1      < 0.1%
91                    2      < 0.1%
86                    1      < 0.1%
72                    1      < 0.1%
62                    2      < 0.1%
60                    1      < 0.1%
57                    1      < 0.1%

unique_products
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 439
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.69295626
Minimum: 1
Maximum: 1786
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 13
median: 36
Q3: 85
95-th percentile: 241.4
Maximum: 1786
Range: 1785
Interquartile range (IQR): 72

Descriptive statistics

Standard deviation: 101.7406736
Coefficient of variation (CV): 1.459841555
Kurtosis: 43.87384045
Mean: 69.69295626
Median Absolute Deviation (MAD): 28
Skewness: 4.703075254
Sum: 396762
Variance: 10351.16467
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
1                     278    4.9%
2                     149    2.6%
3                     112    2.0%
10                    101    1.8%
5                     97     1.7%
9                     96     1.7%
8                     93     1.6%
6                     93     1.6%
11                    92     1.6%
7                     90     1.6%
Other values (429)    4492   78.9%

Value                 Count  Frequency (%)
1                     278    4.9%
2                     149    2.6%
3                     112    2.0%
4                     90     1.6%
5                     97     1.7%
6                     93     1.6%
7                     90     1.6%
8                     93     1.6%
9                     96     1.7%
10                    101    1.8%

Value                 Count  Frequency (%)
1786                  1      < 0.1%
1766                  1      < 0.1%
1322                  1      < 0.1%
1118                  1      < 0.1%
1109                  1      < 0.1%
884                   1      < 0.1%
817                   1      < 0.1%
748                   1      < 0.1%
730                   1      < 0.1%
720                   1      < 0.1%

unique_items
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 1838
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 948.5617425
Minimum: 1
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 106
median: 316
Q3: 804
95-th percentile: 2913.2
Maximum: 196844
Range: 196843
Interquartile range (IQR): 698

Descriptive statistics

Standard deviation: 4183.804245
Coefficient of variation (CV): 4.410682044
Kurtosis: 948.0685665
Mean: 948.5617425
Median Absolute Deviation (MAD): 253
Skewness: 25.19954694
Sum: 5400162
Variance: 17504217.96
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
1                     114    2.0%
2                     73     1.3%
3                     51     0.9%
4                     49     0.9%
5                     35     0.6%
6                     29     0.5%
12                    25     0.4%
88                    22     0.4%
72                    21     0.4%
7                     20     0.4%
Other values (1828)   5254   92.3%

Value                 Count  Frequency (%)
1                     114    2.0%
2                     73     1.3%
3                     51     0.9%
4                     49     0.9%
5                     35     0.6%
6                     29     0.5%
7                     20     0.4%
8                     18     0.3%
9                     7      0.1%
10                    17     0.3%

Value                 Count  Frequency (%)
196844                1      < 0.1%
80263                 1      < 0.1%
77373                 1      < 0.1%
69993                 1      < 0.1%
64549                 1      < 0.1%
64124                 1      < 0.1%
63312                 1      < 0.1%
58343                 1      < 0.1%
57885                 1      < 0.1%
50255                 1      < 0.1%

daily_purchase_rate
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 1226
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5473078999
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.6 KiB

Quantile statistics

Minimum: 0.005449591281
5-th percentile: 0.01104159896
Q1: 0.02492211838
median: 1
Q3: 1
95-th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.9750778816

Descriptive statistics

Standard deviation: 0.5502787309
Coefficient of variation (CV): 1.00542808
Kurtosis: 139.1642784
Mean: 0.5473078999
Median Absolute Deviation (MAD): 0
Skewness: 4.859773608
Sum: 3115.823874
Variance: 0.3028066817
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
1                     2878   50.6%
2                     47     0.8%
0.0625                18     0.3%
0.02777777778         17     0.3%
0.02380952381         16     0.3%
0.08333333333         15     0.3%
0.09090909091         15     0.3%
0.02941176471         14     0.2%
0.03448275862         14     0.2%
0.07692307692         13     0.2%
Other values (1216)   2646   46.5%

Value                 Count  Frequency (%)
0.005449591281        1      < 0.1%
0.005464480874        1      < 0.1%
0.005479452055        1      < 0.1%
0.005494505495        1      < 0.1%
0.005586592179        2      < 0.1%
0.005602240896        1      < 0.1%
0.005617977528        2      < 0.1%
0.00566572238         1      < 0.1%
0.005681818182        2      < 0.1%
0.005698005698        3      0.1%

Value                 Count  Frequency (%)
17                    1      < 0.1%
4                     1      < 0.1%
3                     5      0.1%
2                     47     0.8%
1.142857143           1      < 0.1%
1                     2878   50.6%
0.75                  1      < 0.1%
0.6666666667          3      0.1%
0.550802139           1      < 0.1%
0.5335120643          1      < 0.1%

total_prod_returned
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 212
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.6597576
Minimum: 0
Maximum: 8004
Zeros: 4191
Zeros (%): 73.6%
Negative: 0
Negative (%): 0.0%
Memory size: 44.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 38
Maximum: 8004
Range: 8004
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 166.7090233
Coefficient of variation (CV): 10.00668961
Kurtosis: 1115.784459
Mean: 16.6597576
Median Absolute Deviation (MAD): 0
Skewness: 28.80267821
Sum: 94844
Variance: 27791.89845
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
0                     4191   73.6%
1                     169    3.0%
2                     150    2.6%
3                     105    1.8%
4                     89     1.6%
6                     78     1.4%
5                     61     1.1%
12                    52     0.9%
7                     44     0.8%
8                     43     0.8%
Other values (202)    711    12.5%

Value                 Count  Frequency (%)
0                     4191   73.6%
1                     169    3.0%
2                     150    2.6%
3                     105    1.8%
4                     89     1.6%
5                     61     1.1%
6                     78     1.4%
7                     44     0.8%
8                     43     0.8%
9                     41     0.7%

Value                 Count  Frequency (%)
8004                  1      < 0.1%
4427                  1      < 0.1%
3768                  1      < 0.1%
3332                  1      < 0.1%
2878                  1      < 0.1%
2022                  1      < 0.1%
2012                  1      < 0.1%
1776                  1      < 0.1%
1594                  1      < 0.1%
1535                  2      < 0.1%

Interactions

[Figures: 81 pairwise interaction plots, one for each ordered pair of the 9 variables; images not reproduced here]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
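
The recipe above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The function names are illustrative, and ties are handled by average ranks, which is one common convention and an assumption here rather than something the report specifies.

```python
def pearson(x, y):
    """Covariance of x and y divided by the product of their std devs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: y is a nonlinear but strictly increasing function of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y))  # essentially 1: the relationship is perfectly monotonic
print(pearson(x, y))   # below 1: the relationship is not linear
```

The toy example shows why ρ is the better choice for monotonic but nonlinear relationships: the ranks of x and y coincide even though the raw values do not lie on a line.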

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r; only its direction does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
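
A minimal sketch of that calculation, together with a check of the invariance property: shifting and rescaling either variable by a positive factor leaves r unchanged. The function name and the sample values are made up for illustration.

```python
def pearson(x, y):
    """Covariance of x and y divided by the product of their std devs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)


# Hypothetical sample values.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 12.0]

r = pearson(x, y)
# Separate changes of location and (positive) scale for each variable.
r_rescaled = pearson([3 * a + 10 for a in x], [0.5 * b - 4 for b in y])

# Exactly linear relationships give |r| = 1 whatever the slope.
steep = pearson(x, [100 * a for a in x])
shallow = pearson(x, [0.01 * a for a in x])
falling = pearson(x, [-a for a in x])

print(r, r_rescaled)
print(steep, shallow, falling)
```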

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
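
The pair-counting definition translates directly into code. The sketch below implements exactly that ratio (often called τ-a) and counts tied pairs as neither concordant nor discordant; library implementations commonly apply a tie correction instead, so results can differ on data with many ties. The function name and sample values are illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, no tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Hypothetical sample: 10 pairs in total, 7 concordant and 3 discordant.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau(x, y))        # (7 - 3) / 10
print(kendall_tau(x, x))        # identical ordering
print(kendall_tau(x, x[::-1]))  # fully reversed ordering
```

This brute-force version examines every pair, so its cost grows quadratically with the number of observations; it is meant to make the definition concrete rather than to be fast.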

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

#     df_index  customer_id  gross_revenue  recency_days  qtt_invoices  unique_products  unique_items  daily_purchase_rate  total_prod_returned
0     0         17850        5391.21        372.0         34.0          21.0             1733.0        17.000000            40.0
1     1         13047        3232.59        56.0          9.0           105.0            1390.0        0.028302             35.0
2     2         12583        6705.38        2.0           15.0          114.0            5028.0        0.040323             50.0
3     3         13748        948.25         95.0          5.0           24.0             439.0         0.017921             0.0
4     4         15100        876.00         333.0         3.0           1.0              80.0          0.073171             22.0
5     5         15291        4623.30        25.0          14.0          61.0             2102.0        0.040115             29.0
6     6         14688        5630.87        7.0           21.0          148.0            3621.0        0.057221             399.0
7     7         17809        5411.91        16.0          12.0          46.0             2057.0        0.033520             41.0
8     8         15311        60767.90       0.0           91.0          567.0            38194.0       0.243316             474.0
9     9         16098        2005.63        87.0          7.0           34.0             613.0         0.024390             0.0

Last rows

#     df_index  customer_id  gross_revenue  recency_days  qtt_invoices  unique_products  unique_items  daily_purchase_rate  total_prod_returned
5683  5774      22700        4839.42        1.0           1.0           55.0             1074.0        1.0                  0.0
5684  5775      13298        360.00         1.0           1.0           2.0              96.0          1.0                  0.0
5685  5776      14569        227.39         1.0           1.0           10.0             79.0          1.0                  0.0
5686  5777      22704        17.90          1.0           1.0           7.0              14.0          1.0                  0.0
5687  5778      22705        3.35           1.0           1.0           2.0              2.0           1.0                  0.0
5688  5779      22706        5699.00        1.0           1.0           634.0            1747.0        1.0                  0.0
5689  5780      22707        6756.06        0.0           1.0           730.0            2010.0        1.0                  0.0
5690  5781      22708        3217.20        0.0           1.0           56.0             654.0         1.0                  0.0
5691  5782      22709        3950.72        0.0           1.0           217.0            731.0         1.0                  0.0
5692  5783      12713        794.55         0.0           1.0           37.0             505.0         1.0                  0.0